Closed Set Based Discovery of Small Covers for Association Rules
Authors
Abstract
In this paper, we address the problem of the usefulness of the set of discovered association rules. This problem is important because real-life databases most often yield several thousand rules with high confidence. We propose new algorithms based on Galois closed sets that reduce the extraction to small covers (or bases) for exact and approximate rules, adapted from lattice theory and the data analysis domain. Once frequent closed itemsets, which constitute a generating set for both frequent itemsets and association rules, have been discovered, no additional database pass is needed to derive these bases. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice.
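The generating-set property mentioned in the abstract can be illustrated with a minimal sketch. It is a brute-force illustration only, not the algorithms proposed in the paper: the toy transactions, the threshold, and the function names `cover`, `closure`, `frequent_closed_itemsets` and `support_from_closed` are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy transaction database (an assumption, not data from the paper).
TRANSACTIONS = [
    {"a", "c", "d"},
    {"b", "c", "e"},
    {"a", "b", "c", "e"},
    {"b", "e"},
    {"a", "b", "c", "e"},
]


def cover(itemset, transactions):
    """Indices of the transactions that contain every item of `itemset`."""
    return [i for i, t in enumerate(transactions) if itemset <= t]


def closure(itemset, transactions):
    """Galois closure: the items shared by all transactions containing `itemset`."""
    tids = cover(itemset, transactions)
    if not tids:
        # An itemset contained in no transaction closes to the full item set.
        return frozenset().union(*transactions)
    return frozenset(set.intersection(*(transactions[i] for i in tids)))


def frequent_closed_itemsets(transactions, min_count):
    """Brute-force enumeration: map each frequent closed itemset to its support count."""
    items = sorted(frozenset().union(*transactions))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = len(cover(candidate, transactions))
            if support >= min_count:
                # An itemset and its closure always have the same support.
                closed[closure(candidate, transactions)] = support
    return closed


def support_from_closed(itemset, closed):
    """Recover a support count from the closed itemsets alone (no database access).

    The closure is the smallest closed superset, so it has the largest support
    among all closed supersets. Returns None if `itemset` is not frequent.
    """
    supports = [s for c, s in closed.items() if itemset <= c]
    return max(supports) if supports else None


if __name__ == "__main__":
    closed = frequent_closed_itemsets(TRANSACTIONS, min_count=2)
    for itemset, support in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

On this toy data, the closure of {a} is {a, c}, so the rule a → c holds in every transaction containing a. This is the kind of exact rule that a basis built from closed itemsets would capture, under the assumptions stated above.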
Similar resources
Fast Algorithms for Mining Generalized Frequent Patterns of Generalized Association Rules
Mining generalized frequent patterns of generalized association rules is an important process in knowledge discovery systems. In this paper, we propose a new approach for efficiently mining all frequent patterns using a novel set enumeration algorithm with two types of constraints on two generalized itemset relationships, called subset-superset and ancestor-descendant constraints. We also show a...
CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets
Association mining may often derive an undesirably large set of frequent itemsets and association rules. Recent studies have proposed an interesting alternative: mining frequent closed itemsets and their corresponding rules, which has the same power as association mining but substantially reduces the number of rules to be presented. In this paper, we propose an efficient algorithm, CLOSET, for mi...
Mining Constant Conditional Functional Dependencies for Improving Data Quality
This paper applies data mining techniques to data cleaning, where they are effective in discovering Constant Conditional Functional Dependencies (CCFDs) from relational databases. These CCFDs are used as business rules for context-dependent data validation. Conditional Functional Dependencies (CFDs) are an extension of Functional Dependencies (FDs) which captures the consistency of data by su...
Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)
In this paper, we aim to improve association rules so that they can be used in recommender systems. Recommender systems provide a method for creating personalized offers. One of the most important types of recommender system is collaborative filtering, which mines user information and offers users appropriate items. Among the data mining methods, finding frequent item sets and ...
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data-oriented approach for evaluating the performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively short period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...
Publication date: 1999